Prediction of actual loads from fare collection data

ABSTRACT

A system and method are provided for predicting passenger loads at vehicle stops on a transportation network. The method includes providing a classifier which has been trained to predict passenger loads at vehicle stops on a transportation route, based on reconstructed passenger loads for vehicle stops on the route. Transaction data is acquired for passengers boarding at vehicle stops on the transportation route. Reconstructed passenger loads are computed for vehicle stops on the route based on the transaction data. With the trained classifier, a passenger load for at least one of the vehicle stops on the transportation route is predicted, based on the reconstructed passenger load for the vehicle stop.

This application claims the priority of EP Application EP17306254, filed Sep. 22, 2017, entitled GOAL-BASED TRAVEL RECONSTRUCTION, by Joseph Rozen, et al. and EP Application EP17306253, filed Sep. 22, 2017, entitled PREDICTION OF ACTUAL LOADS FROM FARE COLLECTION DATA, by Sofia Zaourar Michel, et al., the disclosures of which are incorporated herein by reference in their entireties. Cross-reference is made to copending application Ser. No. 15/788,334, filed Oct. 19, 2017, entitled GOAL-BASED TRAVEL RECONSTRUCTION, by Joseph Rozen, et al., the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND

The exemplary embodiment relates to transportation planning and finds particular application in a system and method for prediction of passenger load on a transportation network.

In transportation networks, information on passenger load between stops can be used by transportation planners to determine whether the transportation network is meeting the needs of its users and to modify schedules, vehicle capacity, and the like to provide an appropriate level of service in a cost-efficient manner.

Traditionally, information on passenger load was obtained manually, by positioning observers on vehicles or at stops or by use of household surveys or roadside interviews. However, this is an expensive and time-consuming method for collecting data. More recently, vehicles have been provided with the ability to collect passenger information, for example through automatic ticketing validation (ATV) systems. However, such systems tend to give an incomplete picture of passenger occupancy of a vehicle. Many such systems detect only the boarding of passengers, for example, when passengers swipe an ATV machine with a ticket. Alightings, on the other hand, usually have to be inferred from subsequent validations for travelers, or tracked by anonymized ticket numbers, during a day of operations. A set of heuristics may be used, which take into account that some passengers make multi-leg journeys (alighting at one stop and boarding at another before completing their journey) and that passengers generally return to the origin of their first trip of the day at the end of the day.

To estimate the number of passengers traveling between any two points on the network, methods have been developed, as described for example, in U.S. Pub. No. 20150186792. The information may be stored in a matrix.

However, such methods have been found to underestimate passenger load in some cases. For example, some passengers may forget to validate their tickets or fail to purchase a ticket. As a result, overcrowding may occur on some vehicles, leading to passenger dissatisfaction and potential loss of revenue through loss of ridership. This is compounded by changes in traveler behavior over time. For example, the number of passengers on a particular vehicle may vary over the course of a day, the number of travelers on the network may vary over the course of a week, the number of travelers at a particular stop may vary over the course of hours, and the like.

There remains a need for a system and method for more realistic prediction of actual loads from automatically-collected data.

INCORPORATION BY REFERENCE

The following references, the disclosures of which are incorporated herein in their entireties, by reference, are mentioned:

U.S. Pub. No. 20170206715, published Jul. 20, 2017, entitled LOCALIZATION OF TRANSACTION TAGS, by Remi Feuillette, et al.

U.S. Pub. No. 20170206201, published Jul. 20, 2017, entitled SMOOTHED DYNAMIC MODELING OF USER TRAVELING PREFERENCES IN A PUBLIC TRANSPORTATION SYSTEM, by Boris Chidlovskii.

U.S. Pub. No. 20170169373, published Jun. 15, 2017, entitled SYSTEM AND METHOD FOR MEASURING PERCEIVED IMPACT OF SCHEDULE DEVIATION IN PUBLIC TRANSPORT, by Frederic Roulland, et al.

U.S. Pub. No. 20170132544, published May 11, 2017, entitled METHOD AND SYSTEM FOR STOCHASTIC OPTIMIZATION OF PUBLIC TRANSPORT SCHEDULES, by Sofia Zaourar Michel, et al.

U.S. Pub. No. 20170109764, published Apr. 20, 2017, entitled SYSTEM AND METHOD FOR MOBILITY DEMAND MODELING USING GEOGRAPHICAL DATA, by Abhishek Tripathi, et al.

U.S. Pub. No. 20170053209, published Feb. 23, 2017, entitled SYSTEM AND METHOD FOR MULTI-FACTORED-BASED RANKING OF TRIPS, by Eric Ceret, et al.

U.S. Pub. No. 20160364645, published Dec. 15, 2016, entitled LEARNING MOBILITY USER CHOICE AND DEMAND MODELS FROM PUBLIC TRANSPORT FARE COLLECTION DATA, by Luis Rafael Ulloa Paredes, et al.

U.S. Pub. No. 20160123748, published May 5, 2016, entitled TRIP RERANKING FOR A JOURNEY PLANNER, by Boris Chidlovskii.

U.S. Pub. No. 20160033283, published Feb. 4, 2016, entitled EFFICIENT ROUTE PLANNING IN PUBLIC TRANSPORTATION NETWORKS, by Luis Rafael Ulloa Paredes.

U.S. Pub. No. 20150186792, published Jul. 2, 2015, entitled SYSTEM AND METHOD FOR MULTI-TASK LEARNING FOR PREDICTION OF DEMAND ON A SYSTEM, by Boris Chidlovskii.

U.S. Pub. No. 20140288982, published Sep. 25, 2014, entitled TEMPORAL SERIES ALIGNMENT FOR MATCHING REAL TRIPS TO SCHEDULES IN PUBLIC TRANSPORTATION SYSTEMS, by Boris Chidlovskii.

U.S. Pub. No. 20140201066, published Jul. 17, 2014, entitled SYSTEM AND METHOD FOR ENABLING TRANSACTIONS ON AN ASSOCIATED NETWORK, by Pascal Roux, et al.

U.S. Pub. No. 20140089036, published Mar. 27, 2014, entitled DYNAMIC CITY ZONING FOR UNDERSTANDING PASSENGER TRAVEL DEMAND, by Boris Chidlovskii.

U.S. Pub. No. 20130317884, published Nov. 28, 2013, entitled SYSTEM AND METHOD FOR ESTIMATING A DYNAMIC ORIGIN-DESTINATION MATRIX, by Boris Chidlovskii.

U.S. Pub. No. 20130317747, published Nov. 28, 2013, entitled SYSTEM AND METHOD FOR TRIP PLAN CROWDSOURCING USING AUTOMATIC FARE COLLECTION DATA, by Boris Chidlovskii, et al.

U.S. Pub. No. 20130317742, published Nov. 28, 2013, entitled SYSTEM AND METHOD FOR ESTIMATING ORIGINS AND DESTINATIONS FROM IDENTIFIED END-POINT TIME-LOCATION STAMPS, by Boris Chidlovskii.

U.S. Pub. No. 20130185324, published Jul. 18, 2013, entitled LOCATION-TYPE TAGGING USING COLLECTED TRAVELER DATA, by Guillaume M. Bouchard, et al.

U.S. Pub. No. 20090283591, published Nov. 19, 2009, entitled PUBLIC TRANSIT SYSTEM FARE PROCESSOR FOR TRANSFERS, by Martin Silbernagl.

U.S. application Ser. No. 15/151,773, filed May 11, 2016, entitled TRAVEL DEMAND INFERENCE FOR PUBLIC TRANSPORTATION SIMULATION, by Boris Chidlovskii.

BRIEF DESCRIPTION

In accordance with one aspect of the exemplary embodiment, a method for predicting passenger load includes providing a classifier which has been trained to predict passenger loads at vehicle stops on a transportation route, based on reconstructed passenger loads for vehicle stops on the route. Transaction data is acquired for passengers boarding at vehicle stops on the transportation route. Reconstructed passenger loads are computed for vehicle stops on the route based on the transaction data. With the trained classifier, a passenger load for at least one of the vehicle stops on the transportation route is predicted, based on the reconstructed passenger load for the vehicle stop.

One or more steps of the method may be performed with a processor.

In accordance with another aspect of the exemplary embodiment, a system for predicting passenger load includes a classifier which has been trained to predict passenger loads at vehicle stops on a transportation route, based on reconstructed passenger loads for vehicle stops on the route. A trip reconstruction component predicts alighting stops for passenger trips based on transaction data for passengers boarding at vehicle stops on the transportation route. A load reconstruction component computes reconstructed passenger loads for vehicle stops on the route, based on the boarding stops and the predicted alighting stops. A load prediction component uses the trained classifier to predict a passenger load for at least one of the vehicle stops on the transportation route, based on the reconstructed passenger load for the vehicle stop. A processor implements the trip reconstruction component, load reconstruction component, and load prediction component.

In accordance with another aspect of the exemplary embodiment, a method for generating a system for predicting passenger load includes acquiring transaction data for passengers boarding at vehicle stops on a plurality of transportation routes in a transportation network and computing reconstructed passenger loads for the vehicle stops on the plurality of transportation routes, based on the transaction data. The method further includes acquiring count data for passengers boarding at the vehicle stops on the plurality of transportation routes and computing actual passenger loads for the vehicle stops on the plurality of transportation routes, based on the count data. A classifier is trained to predict passenger loads at vehicle stops on a transportation route in the transportation network. The training is based on the reconstructed passenger loads and actual passenger loads on the plurality of transportation routes. The trained classifier is stored in memory for predicting a passenger load for one of the vehicle stops on one of the transportation routes, based on a new reconstructed passenger load for the vehicle stop.

One or more steps of the method may be performed with a processor.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram of an environment in which a system for predicting passenger load operates;

FIG. 2 is a functional block diagram of an example load prediction system;

FIG. 3 is a flow chart illustrating a method for predicting passenger load;

FIG. 4 is a graph showing mean absolute error (MAE) in predicted passenger loads for trips made on a transportation route over the course of a month; and

FIG. 5 is a graph showing actual load (APCLoad), predicted load, and reconstructed load (FCLoad) at stops on one trip on a transportation route.

DETAILED DESCRIPTION

A system and method are described for predicting passenger loads on vehicles in a transportation network. The predicted loads are valuable to agencies balancing traveler satisfaction and operating costs. The predicted loads account for differences between reconstructed passenger loads, inferred from transaction data, and the actual passenger loads aboard the vehicles.

As used herein, the actual passenger load at a given stop is the number of passengers on board a transport vehicle and corresponds to the number of passengers on board prior to arrival at the stop plus the number of passengers boarding at the stop, minus the number of passengers alighting at the stop. In some cases, the actual passenger load may be an aggregate or average value, computed over several trips made by the same or different transport vehicles on the same transport route.

With reference to FIGS. 1 and 2, a load computation system 10 analyzes transaction data 12 (sometimes referred to as fare collection (FC) data), which is received, directly or indirectly, from an associated transportation network 14. Using the transaction data, the system 10 generates predicted passenger load data 16 for one or more routes of the transportation network 14, or information based thereon. The transaction data 12 may be received by the system 10 from an intermediate data collection server (DCS) 18, which collects the data from various parts of the network and may process the data, for example, in order to collect fares for trips made by users of the network 14.

The transportation network 14 can be as described, for example, in one or more of above-mentioned U.S. Pub. Nos. 20170206715, 20170206201, 20170169373, 20170132544, 20170109764, 20170053209, 20160364645, 20160123748, 20160033283, 20140288982, 20140201066, 20140089036, 20130317884, 20130317747, 20130317742, 20130185324, and U.S. Ser. No. 15/151,773.

By way of example, the transportation network 14 includes multiple public transport vehicles 20, 22, etc., such as buses or trams. The vehicles travel on different routes 24, 26, etc. of the network 14, according to predefined schedules, to provide transportation services that are utilized by a large number of users, which may be referred to as passengers or travelers. Each route may include a set of predetermined stops 28, 30, 32, etc. (such as bus stops or tram stops), at fixed locations on the route, where passengers can board or alight from a vehicle during a given vehicle trip. Each vehicle trip may start at the first stop on the route, make stops at one or more intermediate stops, and end the trip at a final stop on the route. In some cases, the last stop of one trip may be the same as the first stop of the next trip.

Data collection units, such as automatic ticketing validation (ATV) devices 34, 36 on or associated with the vehicles, collect transaction data 12 for each transaction with a passenger, which is sent to the data collection server 18 and/or the load computation system 10, for processing. The ATV device 34, 36 may include an RFID (Radio-frequency identification) transaction tag that collects transaction data 12 for travelers. Each vehicle 20, 22 may include one or more ATV devices 34, 36, e.g., mounted in the passenger area of the vehicle or by the door where passengers enter or leave the vehicle. In some embodiments, ATV devices 34, 36 may be located at or near stops on the transportation route, e.g., on platforms or at entry gates. Transaction data 12, generated by the ATV devices 34, 36, is collected by the DCS 18, and may be used, in some cases, for invoicing passengers for their trips.

One or more of the vehicles 20 may be equipped with an automated passenger counter (APC) 38, which detects passengers entering and/or leaving the vehicle. Each detection of a person is associated with a respective time stamp. Count data 40 are sent from the APC 38 to the system 10, which links the counts to vehicle stops on a vehicle trip by comparing the time stamps of the counts and the observed stop times of the vehicle. The APCs may operate by using infrared lights above the doorways to the vehicle. A set of infrared beams is spaced so that the order in which the beams are broken by a person determines whether the person is entering or exiting the vehicle. Alternatively, CCTV cameras can be used, together with intelligent people counting software, to log numbers of people getting on and off at each stop.
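By way of illustration only, the linking of time-stamped APC detections to vehicle stops may be sketched as follows. The function name, the event format (a time in seconds and a +1/-1 direction), and the rule of attributing each detection to the most recent observed stop arrival are assumptions made for this sketch and are not specified above.

```python
from bisect import bisect_right

def link_counts_to_stops(events, stop_times):
    """Attribute APC detections to stops on one vehicle trip.

    events:     list of (timestamp_seconds, direction), direction +1 for an
                entry and -1 for an exit (hypothetical format).
    stop_times: list of (arrival_seconds, stop_id), sorted by arrival time.
    Assumption: a detection belongs to the latest stop whose observed
    arrival time is at or before the detection time.
    """
    arrivals = [t for t, _ in stop_times]
    counts = {stop: {"boardings": 0, "alightings": 0} for _, stop in stop_times}
    for ts, direction in events:
        idx = bisect_right(arrivals, ts) - 1
        if idx < 0:
            continue  # detection before the first observed stop: ignored here
        stop = stop_times[idx][1]
        key = "boardings" if direction > 0 else "alightings"
        counts[stop][key] += 1
    return counts
```

Other matching rules, such as a tolerance window around each stop time, could be substituted without changing the overall approach.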

Not all vehicles on a route or on the network need to be equipped with an APC 38. APC systems can be costly to maintain and operate. Equipped vehicles may perform many trips along the various routes of the network and the collected data may be digested and analyzed before passenger loads are available. In other embodiments, the count data are obtained by people assigned to count the number of passengers entering and leaving the vehicle at each stop. The count data 40 for each of a set of stop times are then sent to the system. The people performing the counting may be located on the vehicles or positioned at the stops.

In the present system and method, the transaction data 12 is used to predict the passenger load 16 at or between one or more stops on the route(s) 24, 26, in particular, for stops where no count data 40 is available, by learning a relationship between the observed count data 40 and the reconstructed load data 42 inferred from the transaction data 12.

The predicted loads 16 can differ from the reconstructed loads 42, generated from the transaction data 12. The difference may be for a variety of reasons: some travelers may be allowed to board without checking in, for example, because they hold passes that allow them to travel without tickets. Others may forget to check in or intentionally avoid checking in. Some passengers may use single use or other tickets, such as paper tickets, that are not observed by the system.

The automatic ticketing validation (ATV) devices 34, 36 and counters 38 may supply the transaction data 12 and count data 40 by wired or wireless connection 44 to the DCS 18 and/or system 10. Various methods of transferring the data 12, 40 are contemplated, such as via a wide area network, such as the internet, by using users' smart devices 46 as relay devices (see, for example, U.S. Pub. No. 20140201066), via a local area network or direct connection when the vehicle has returned to its base location, or via short range communication when the vehicle is within range of a fixed communication device 48, which may be positioned at or near one or more of the stops 28, 30, 32, etc. The fixed communication device 48 may transmit data wired or wirelessly to the system 10. As will be appreciated, the system and method are not limited to the method(s) used to collect and transmit the data 12, 40 and more than one method may be used.

In one embodiment, at least some users of the transportation network preregister with the DCS 18, or with another registration system. The user is issued with an electronic ticket 50, which allows the user to take trips on the network and to pay for the trips, e.g., by deduction of a respective amount from a stored value on the card or by billing the user's credit card. The electronic ticket 50 may be in the form of a physical card, e.g., with an RFID tag, or may be in the form of a software application on the smart device 46, that is equipped with a short range communication device. When the user swipes the ATV device 34, 36 with the smartcard or smart phone 46, the ATV device determines whether the ticket is valid for travel and, if validated, generates transaction data 12 for a transaction.

In some cases, a unique identifier (user ID) 52 is associated with each of the registered users, that may be stored on the electronic ticket 50. Other users of the network may use a stored value ticket 50, which allows a user to add value anonymously to the ticket. The ticket 50 may have a unique ID 52 which is inferred to be associated with trips made by the ticket holder and can be treated in the same manner as a User ID for analysis purposes.

In some embodiments, the automatic ticketing validation (ATV) device 34 and Automated Passenger Counter (APC) 38 may be incorporated into a common data collection unit on the vehicle which outputs the count data and transaction data.

Each time a user swipes an ATV device 34 with his or her electronic ticket 50, information passes between the electronic ticket 50 and the ATV device, e.g., by short range communication, such as Near Field Communication (NFC), and if the electronic ticket 50 is valid (e.g., for the route and/or time of day), transaction data 12 is generated. The transaction data 12 may include some or all of the ticket/user ID 52, a time stamp 54 of the transaction, location data 56, and a vehicle/ATV identifier 58. The data 12 output is sufficient to identify the stop 30 on the scheduled route where the respective passenger boarded the vehicle.
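By way of illustration only, a single transaction record of the kind described may be represented as follows. The class and field names are hypothetical and are chosen for this sketch only; the actual format of the transaction data 12 is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class Transaction:
    """One validated boarding transaction (hypothetical field names)."""
    ticket_id: str            # ticket/user ID
    timestamp: datetime       # time stamp of the transaction
    location: Optional[str]   # location data, e.g., a stop identifier, if available
    vehicle_id: str           # vehicle/ATV identifier

# Example record with no location data, as may occur when the network
# supplies only time stamps.
example = Transaction("T-0001", datetime(2017, 9, 22, 8, 15), None, "bus-20")
```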

The location data 56 may be generated using an Automated Vehicle Location (AVL) device 60 onboard the vehicle 20. In one embodiment, the AVL device includes a GPS system onboard the vehicle, which communicates with a satellite 62 to identify the vehicle's location at each stop. Alternatively, the AVL device acquires location information from the fixed beacons 48 at the stops, which each transmits a respective location to the vehicle, when within communication range. Such a system is described, for example, in U.S. Pub. No. 20170206715, published Jul. 20, 2017, entitled LOCALIZATION OF TRANSACTION TAGS, by Remi Feuillette, et al. In other embodiments, where ATV devices 34, 36 are fixed in location, along the route, the location 56 of the ATV device may be associated with the electronic ticket 50 when the user swipes the ATV device with the ticket. In some embodiments, where the system receives no location data from the network 14, the location data 56 may be generated by the system 10, e.g., by comparing the transaction time stamp 54 with the scheduled stop times at each stop for the route, which may be stored in a route schedule 64.

The time stamp 54 may be generated by a clock which is part of or associated with the ATV device 34. The clock may be periodically synchronized with accurate time information provided by the fixed beacons 48 at the stops, which each transmits a respective time signal to the vehicle, when within communication range. The AVL device 60 may also include a clock which provides time stamps for each of the stops. The AVL clock may be synchronized periodically with the ATV and/or beacon clock(s). The transaction time stamps 54 may differ from the stop times provided by the AVL device 60, depending on whether the transaction takes place before or after boarding.

The vehicle/ATV ID 58 may be used to associate the vehicle with the other transaction data 12 and the count data 40.

The count data 40 may be computed by the APC 38, by maintaining a running count of the total number of passengers on board the vehicle, based on detected movements of people onto and off the vehicle. In other embodiments, the count data 40 may simply be raw data on the passenger detections that is collected and sent to the system 10. In some embodiments, each passenger entry/departure may be associated with a time stamp. The APC count data 40 thus allows the actual passenger load 66 (APC data) at a given stop to be determined.

With continued reference to FIG. 2, the load computation system 10 includes memory 70, which stores instructions 72 for performing the exemplary method, and a processor 74, in communication with the memory, for executing the instructions. In particular, the processor 74 executes instructions for performing the method outlined in FIG. 3. The processor may also control the overall operation of the computer system 10 by execution of processing instructions which are stored in memory 70. The computer system 10 also includes one or more input/output (I/O) interfaces 76, 78 for communicating with external devices, such as DCS 18 and/or the network 14, e.g., via a link 80, such as a wired or wireless network, such as the Internet.

The input/output interface 76 receives as input the transaction (FC) data 12 and count data 40 for one or more routes of the network for each of a set of vehicle trips. The input/output interface 78 may output the predicted passenger load data 16, or information based thereon, to an output device 82, such as a display device, a printer, a client computing device, incorporating one or more of a computer incorporating memory and a processor, a display device for displaying information to users, and a user input device for inputting text and/or for communicating user input information and command selections to the processor, which may include one or more of a keyboard, keypad, touch screen, writable screen, and a cursor control device, such as mouse, trackball, or the like. The various hardware components 70, 74, 76, 78 of the system 10 may be all connected by a bus 84. The system 10 may be hosted by one or more computing devices, such as the illustrated server computer 86.

The computer system 10 may include one or more of a PC, such as a desktop, a laptop, palmtop computer, portable digital assistant (PDA), server computer, cellular telephone, tablet computer, pager, combination thereof, or other computing device capable of executing instructions for performing the exemplary method.

The memory 70 may represent any type of non-transitory computer readable medium such as random access memory (RAM), read only memory (ROM), magnetic disk or tape, optical disk, flash memory, or holographic memory. In one embodiment, the memory 70 comprises a combination of random access memory and read only memory. In some embodiments, the processor 74 and memory 70 may be combined in a single chip. The input/output (I/O) interfaces 76, 78 allow the computer to communicate with other devices via a computer network, such as a local area network (LAN) or wide area network (WAN), or the internet, and may comprise a modulator/demodulator (MODEM), a router, a cable, and/or an Ethernet port. Memory 70 stores instructions for performing the exemplary method as well as the processed data, such as actual passenger load data (APC data) 66 and predicted passenger load data 16.

The digital processor 74 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like.

The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in a storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth. Such software may be organized in various ways, and may include software components organized as libraries, Internet-based programs stored on a remote server or so forth, source code, interpretive code, object code, directly executable code, and so forth. It is contemplated that the software may invoke system-level code or calls to other software residing on a server or other location to perform certain functions.

The software instructions 72 may include various components for implementing parts of the method. In some embodiments, some or all of the software components may be wholly or partly resident on the client device 82 and/or DCS 18.

The illustrated software instructions include a passenger trip reconstruction component 90, an FC load reconstruction component 92, an APC load computation component 94, a training component 96, a load prediction component 98, an optional network modification component 100, and an output component 102.

The passenger trip reconstruction component 90 computes boarding and alighting data 102 for each route 24, 26 in the network 14 (or for at least some routes), based on the transaction data 12. The boarding and alighting data 102 includes the number of boardings at each stop along each route for a given trip and the predicted number of alightings at each stop, based on the transaction (FC) data.

The trip reconstruction component 90 first processes the transaction data 12 to associate each recorded transaction with a boarding stop on a given route during a given vehicle trip. The method of identifying the boarding stop may vary, depending on the information provided in the transaction data 12. For example, in some cases, e.g., when the ATV device 34 is at a fixed location, e.g., at the stop location, the stop location may be included in the transaction data for the passenger's trip. In other embodiments, the boarding stop may be inferred from the time stamp 54 of the transaction and/or associated location data 56. For example, observed stop times for vehicles on a route can be compiled from the recorded transaction data 12 and/or extracted from the AVL 60 data. These observed stop times can be grouped into a sequence of stop times which are aligned to theoretical stop times for the scheduled trip, which are available from the route schedule 64. The planned transportation services for each route in the network 14 may be described in publicly-distributed General Transit Feed Specification (GTFS) files, which can be used or adapted for use as the route schedule 64. Each passenger boarding can thus be associated with a particular stop and a theoretical stop time.
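By way of illustration only, inferring a boarding stop from a transaction time stamp may be sketched as follows. The sketch simplifies the alignment described above to a nearest-time match against the scheduled stop times of a single vehicle trip; the function name and the representation of times as seconds since midnight are assumptions.

```python
from bisect import bisect_left

def nearest_scheduled_stop(transaction_time, schedule):
    """Return the stop whose scheduled time is closest to a transaction.

    transaction_time: seconds since midnight (assumed representation).
    schedule: non-empty list of (scheduled_seconds, stop_id) for one
              vehicle trip, sorted by time.
    """
    times = [t for t, _ in schedule]
    i = bisect_left(times, transaction_time)
    # Only the scheduled times just before and just after can be closest.
    candidates = schedule[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda entry: abs(entry[0] - transaction_time))[1]
```

In practice, observed stop times from the AVL data could be used in place of the theoretical times when they are available.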

Since the alightings are not generally available from the network 14, the trip reconstruction component 90 applies a set of heuristics 104 to the transaction data 12, collected for each passenger throughout a day, or over other suitable time period, to predict an alighting associated with each identified passenger boarding. The passenger is inferred to be the same if the same user ID or ticket ID 52 is used in more than one transaction on the network. Example heuristics 104 may include some or all of:

1. Users return to their previous trip's destination stop for their next trip, or to the nearest stop on the next route, i.e., users alight at the closest stop from the next boarding stop in their day's history.

2. Users generally have a symmetric use of the network, i.e., the first boarding of the day is the final destination of the same day (or the previous day), assuming this is possible, given the route selected for the last boarding. A day may be defined as a 24 hour period starting at midnight, or at some other predefined time.

3. For users making only one boarding in the day, the alighting may be assumed to follow an alighting distribution which is the same as that of users with multiple boardings (or alternatively, these boardings may be ignored).

4. Alighting stops for travelers without electronic tickets 50, such as those using single journey tickets (including multi-trip tickets), follow a similar distribution to those using smart cards in their boarding and alightings.

Methods for applying heuristics to link boardings with alightings on the same route are described, for example in above-mentioned U.S. Pub. No. 20130317884 and U.S. Pub. No. 20130317742, and copending application Ser. No. 15/______, filed ______, 2017, entitled GOAL-BASED TRAVEL RECONSTRUCTION, by Joseph Rozen, et al. Other heuristics may be used in addition to or in place of these. The aim is to associate each passenger boarding stop, identified from the transaction data, with a respective alighting stop on the same route and vehicle trip. The boarding and alighting data may be aggregated over the passengers on the same vehicle trip to identify the number of passenger boardings and alightings at each stop. In some cases, the data may be aggregated over several days, e.g., for the same scheduled trip time, and optionally then averaged to provide average daily numbers of passenger boardings and alightings.
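By way of illustration only, the first two example heuristics may be sketched as follows for one passenger's boardings over a day. The sketch assumes that each next boarding stop is itself served by the route of the preceding trip, so the search for the closest stop on that route is omitted; single-boarding days are left unresolved. The function name is hypothetical.

```python
def infer_alightings(boardings):
    """Pair each boarding stop with an inferred alighting stop.

    boardings: one passenger's boarding stops for a day, in time order.
    Returns a list of (boarding_stop, inferred_alighting_stop) pairs.
    """
    if len(boardings) < 2:
        return []  # heuristic 3 (single boarding) is not handled in this sketch
    trips = []
    for i, stop in enumerate(boardings):
        # Heuristic 1: the passenger alights where the next boarding occurs.
        # Heuristic 2: the modulo wraps the last trip of the day back to the
        # first boarding stop of the day.
        next_stop = boardings[(i + 1) % len(boardings)]
        trips.append((stop, next_stop))
    return trips
```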

The FC load reconstruction component 92 infers a reconstructed passenger load 42 between a pair of stops, on a given service along a route, based on the boarding and alighting data 102. This may include, for each stop in sequence along the route, starting with the first, computing the reconstructed passenger load (FC load) as a sum:

Passenger load=number of passengers already on board+number of passengers boarding−number of passengers alighting   (1)

For the first stop, the number of passengers on board may be assumed to be zero, unless the route is a continuous route in which passengers can enter and leave at any stop. Similarly, for the last stop, only alighting may be permitted, so the passenger load may be set to zero, except in the case of the continuous route.
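By way of illustration only, Equation (1) may be applied stop by stop as in the following sketch. The function name and the initial_load parameter (which would be nonzero only for a continuous route) are assumptions made for this example.

```python
def reconstruct_loads(boardings, alightings, initial_load=0):
    """Compute the passenger load after each stop, in route order.

    boardings, alightings: per-stop counts for one vehicle trip.
    initial_load: passengers already on board before the first stop
                  (zero unless the route is continuous).
    """
    if len(boardings) != len(alightings):
        raise ValueError("boardings and alightings must cover the same stops")
    loads = []
    on_board = initial_load
    for boarded, alighted in zip(boardings, alightings):
        # Equation (1): load = already on board + boarding - alighting
        on_board = on_board + boarded - alighted
        loads.append(on_board)
    return loads
```

For example, boardings of 5, 3, and 0 with alightings of 0, 2, and 6 over three stops give loads of 5, 6, and 0.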

The APC load computation component 94 computes the actual (APC) passenger load 66 in a similar manner, but in this case, using the boarding and alighting counts in the count data 40. As noted, the APC data may be only available for a limited number of stop times for a given day, allowing vehicles equipped with an APC device 38 to rotate around the network to collect counts for the different routes during the month.

The training component 96 trains a classifier 108 for predicting the passenger load data 16. The training component 96 learns parameters of a classifier model with the aim of minimizing, over all the training data 66, 42, the difference between the passenger load predicted from each reconstructed passenger load 42 and the respective actual passenger load 66 for each stop on the route. The classifier may include one or more classifier models 110, 112, etc. In an example embodiment, a classifier sub-model 110 is trained for each route 24, 26, if sufficient training data is available. A global classifier model 112 may be trained on training data for a number of routes, such as two, three, or more routes. The various sub-models 110, 112 are combined into a single hierarchical classifier model 108.

The classifier model(s) may each be a (multiple) linear regression classifier which is trained to optimize (e.g., minimize), over the training set, a loss function which takes as input the actual passenger load 66 and the respective reconstructed passenger load 42 (and/or one or more other factors) for each of a set of vehicle stops on the route or entire network. In the training, parameters θ_(n) are learned for weighting each of the factors F_(n), where n is an index over the factors. The loss to be minimized is thus a function, over a set of N training samples, of a difference between the actual passenger load APCL and a predefined function ƒ of the set of parameter-weighted factors θ_(n) F_(n). The function ƒ can simply be the sum of the set of parameter-weighted factors, can include additional terms, such as a bias term, or may be a more complex function. The parameters θ_(n) can be scalar values or drawn from probability distributions (which may be quantized into a set of bins).
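One possible form of such a loss, given here only as an illustration, is a sum of squared differences with ƒ taken as the sum of the parameter-weighted factors plus a bias term. The squared-error form, the sample index i, and the bias symbol θ_0 are assumptions; the description above leaves the exact function open.

```latex
\mathcal{L}(\theta) = \sum_{i=1}^{N} \bigl( \mathrm{APCL}_i - f(\theta, F_i) \bigr)^{2},
\qquad
f(\theta, F_i) = \theta_0 + \sum_{n} \theta_n F_{n,i}
```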

As will be appreciated, rather than minimizing a loss function, a function which maximizes a similarity between the APCL and the predicted passenger load computed with function ƒ may be used.

The classifier model(s) can be trained using, as input, the actual load data 66 for a given stop and some or all of the following factors F_(n):

1. FC load: the reconstructed passenger load for the given vehicle stop on the transportation route.

2. FC load at the previous stop time: the reconstructed passenger load for the immediately previous vehicle stop on the transportation route (which may be set to zero when the given vehicle stop is the first stop on the route).

3. FC boarding: an estimated number of passengers boarding at the given vehicle stop on the transportation route, extracted from the transaction data for the given stop.

4. Aggregated FC boardings for the vehicle trip: an aggregate of the estimated numbers of passengers boarding the vehicle at all the vehicle stops on the route.

5. FC load squared, or other non-linear function of the reconstructed passenger load for the given vehicle stop on the transportation route (to capture some residuals).
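The five factors listed above can be derived from the reconstructed loads and boardings of a single vehicle trip, as in the following sketch. The function name `build_factors` and the tuple ordering (FC load, FC boarding, FC load at previous stop, aggregated FC boardings, FC load squared) are illustrative choices; the input values are the FC Load and FC Boarding rows of TABLE 1.

```python
def build_factors(fc_load, fc_boarding):
    """Return one tuple of factors per stop for a single vehicle trip."""
    total_boardings = sum(fc_boarding)  # factor 4: aggregate over the whole trip
    rows = []
    for i, (load, boarding) in enumerate(zip(fc_load, fc_boarding)):
        # Factor 2: load at the previous stop, set to zero at the first stop.
        prev_load = fc_load[i - 1] if i > 0 else 0
        rows.append((load, boarding, prev_load, total_boardings, load ** 2))
    return rows


factors = build_factors([4, 7, 12, 9, 0], [4, 3, 7, 2, 0])
for row in factors:
    print(row)
# (4, 4, 0, 16, 16)
# (7, 3, 4, 16, 49)
# (12, 7, 7, 16, 144)
# (9, 2, 12, 16, 81)
# (0, 0, 9, 16, 0)
```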

In general, at least two or at least three, or at least four of these factors F_(n) (predictors) are used. The classifier training includes learning a respective parameter (or set of parameters) θ_(n) for each of the factors. As an example, the classifier models are trained by Bayesian hierarchical multiple linear regression. See, for example, Mallick, et al., “Bayesian Methods for High Dimensional Linear Models,” J Biom Biostat, 1, pp 1-27, 2014; Raudenbush, et al., “Hierarchical linear models: Applications and data analysis methods,” Vol. 1, Sage, 2002; Albert, et al., “Bayesian Analysis of Binary and Polychotomous Response Data,” J. Am. Statistical Assoc., Vol. 88, No. 422, pp. 669-679 (June 1993). The Bayesian hierarchical multiple linear regression classifier uses parameters drawn from a distribution, which may be learned as part of the training. In another embodiment, fixed value parameters are used, with the same set of factors. The hierarchical nature of the exemplary classifier model allows a grouping of the data by route, to learn route-specific parameters θ_(n). In the classifier model, the route specific parameters of each group are not learnt individually. Their values are constrained by a normal distribution which is learned against the global model 112. As an example, the implementation of PyStan (https://pystan.readthedocs.io/en/latest/), which relies on Stan software (http://mc-stan.org/), can be used to generate and implement the Bayesian hierarchical multiple linear regression classifier.
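As a much-reduced illustration of the fixed-value-parameter embodiment, the sketch below fits a single-factor linear model (bias plus FC load) by ordinary least squares. It is not the Bayesian hierarchical model: the use of ordinary least squares, the restriction to one factor, and the names `fit_simple_regression`, `theta_0` and `theta_1` are all assumptions made to keep the example short. The data are the FC Load and APC Load rows of TABLE 1.

```python
def fit_simple_regression(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


fc_load = [4, 7, 12, 9, 0]    # reconstructed loads (factor)
apc_load = [5, 8, 14, 12, 0]  # actual loads (training target)

theta_0, theta_1 = fit_simple_regression(fc_load, apc_load)
predicted = [theta_0 + theta_1 * x for x in fc_load]
print([round(p, 1) for p in predicted])
```

A slope above 1 in this toy fit reflects that the fare collection data undercounts passengers relative to the APC counts.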

The classifier model 108 is thus trained to take, as input, reconstructed passenger load data 42 and predict the passenger loads 16 for the vehicle stops on a route, when no (or insufficient) actual load data is available for the trip.

The load prediction component 98 uses the trained classifier 108 to predict passenger loads 16 for one or more of the scheduled vehicle stops on a vehicle trip (or for an aggregated set of vehicle trips on the same route), based on the reconstructed passenger load data 42.

The optional proposal component 100 may use the predicted load data to flag potential problems in the network 14. For example, if the passenger load meets or exceeds a predetermined threshold for one or more stops of a vehicle trip, the proposal component 100 may output an alert. Or, it may propose changes to the network, such as adding extra vehicle trips, where a computed cost-benefit analysis suggests this would be advantageous. The cost-benefit analysis may take into account the utility to passengers of not traveling on overcrowded vehicles and, in some cases, being unable to board the vehicle.
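The threshold check can be sketched as below. The function name `flag_overloaded_stops` and the threshold value are hypothetical; the predicted loads are the illustrative values from TABLE 1.

```python
def flag_overloaded_stops(predicted_loads, threshold):
    """Return the 1-based stop numbers whose predicted load meets or exceeds the threshold."""
    return [
        stop
        for stop, load in enumerate(predicted_loads, start=1)
        if load >= threshold
    ]


predicted_load = [6, 8, 13, 12, 1]
alerts = flag_overloaded_stops(predicted_load, threshold=12)
print(alerts)  # [3, 4]
```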

The output component 102 outputs information, such as the predicted passenger load data 16 or information generated therefrom, such as alerts/proposals generated by the proposal component 100.

As will be appreciated, the training component 96 may be part of a separate system or removed from the system 10 when training is complete.

With reference now to FIG. 3, a method for computing passenger load which may be performed with the system of FIGS. 1 and 2 is shown. The method starts at S100.

At S102, transaction data 12 and count data 40, which are to be used in training the classifier 108, are acquired from a transportation network. The data 12, 40 may be stored in memory 70 during processing.

At S104, the transaction data 12 is processed, by the trip reconstruction component 90, using the heuristics 104, to generate boarding and alighting data 102 for passenger trips, which may be stored in memory 70 during processing. This may include predicting alighting stops on the route for the detected passengers, based on other (earlier and/or later) boarding stops on the same or other routes of the transportation network detected for at least some of the detected passengers.

At S106, the boarding and alighting data 102 is processed, by the load reconstruction component 92, to generate reconstructed passenger load data 42, which may be stored in memory 70 during processing.

At S108, the count data 40 is processed, by the load computation component 94, to generate actual passenger load data 66, which may be stored in memory 70 during processing.

At S110, the classifier 108 is trained, by the training component 96, with training data composed of the actual passenger load data 66 and the corresponding reconstructed passenger load data 42 and/or factors derived therefrom. The learned parameters of the classifier model(s) 110, 112, 108 may be stored in memory.

At S112, new transaction data 120 is received by the system. The new transaction data may be acquired for a vehicle trip on the same vehicle route as that for which the training transaction data 12 was acquired, or on a different route, but is generated in the same way.

At S114, step S104 is repeated, with the new transaction data 120.

At S116, step S106 is repeated, with the boarding and alighting data 102 generated at S114 from the new transaction data 120, to obtain reconstructed passenger load data 42.

At S118, the reconstructed passenger load data 42, and/or factors generated therefrom, is input to the classifier 108, by the load prediction component 98, to predict the passenger loads 16.

Optionally, at S120, a determination may be made as to whether the predicted passenger load data computed at S118 exceeds a threshold and, if so, an alert may be output or the load data otherwise flagged. Alternatively or additionally, an expert user may review the predicted passenger load data after it has been output by the system (S122).

The method ends at S122.

The method illustrated in FIG. 3 may be implemented in a computer program product that may be executed on a computer. The computer program product may comprise a non-transitory computer-readable recording medium on which a control program is recorded (stored), such as a disk, hard drive, or the like. Common forms of non-transitory computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, or any other non-transitory medium from which a computer can read and use.

Alternatively, the method may be implemented in transitory media, such as a transmittable carrier wave in which the control program is embodied as a data signal using transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like.

The exemplary method may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, graphics processing unit (GPU), or PAL, or the like. In general, any device, capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIG. 3, can be used to implement the prediction method. As will be appreciated, while the steps of the method may all be computer implemented, in some embodiments one or more of the steps may be at least partially performed manually. As will also be appreciated, the steps of the method need not all proceed in the order illustrated and fewer, more, or different steps may be performed.

The present method enables agencies to visualize loads aboard vehicles, with credible intervals.

As an illustrative example of the method, suppose that training data is acquired for Route 1 in FIG. 1, which includes a sequence of five stops 28, 30, etc., that are identified by the numbers from 1 to 5 (although street names or locations may be commonly used for the names of stops). TABLE 1 shows the type of data which may be obtained and generated. FC Boarding is the number of passengers boarding at each stop, as determined from the fare collection data 12 for a vehicle trip (or average of a set of vehicle trips) along the route. FC Alighting is the number of passengers inferred to be alighting at each stop for the vehicle trip, as based on analysis of the fare collection data 12 for passengers on the entire network, using the heuristics. FC Load is the reconstructed passenger load, which in this case is obtained by applying Equation 1 for each stop in sequence, using the respective FC Boarding and FC Alighting values. APC Boarding and APC Alighting are the counts obtained from the APC sensor 38. APC Load is the actual passenger load obtained by applying Equation 1 for each stop in sequence, using the respective APC Boarding and APC Alighting values. Predicted load values are illustrative of values (rounded to the nearest integer) which could be obtained by training a classifier model on the FC Load and APC Load.

TABLE 1

Stop              1                 2                 3                   4                  5
FC Boarding       4                 3                 7                   2                  0
FC Alighting      0                 0                 2                   5                  9
FC Load           4                 7                 12                  9                  0
APC Boarding      5                 3                 9                   2                  0
APC Alighting     0                 0                 3                   4                  12
APC Load          5                 8                 14                  12                 0
Factors used      4, 4, 0, 16, 16   7, 3, 4, 16, 49   12, 7, 7, 16, 144   9, 2, 12, 16, 81   0, 0, 9, 16, 0
Predicted Load    6                 8                 13                  12                 1

The factors used in prediction are listed for each stop in the order: FC Load, FC Boarding, FC Load-prev stop, Aggregated FC boarding, FC Load squared.

Without intending to limit the scope of the exemplary embodiment, the following example illustrates application of the method to a public transport network in a French city.

EXAMPLE

Data was obtained from three operational sources (transaction data from ATV devices, AVL data, and APC data) to reconstruct loads for vehicles in a public transport network and then to predict actual loads from the reconstructed ones.

Similar models, though with different predictors, could be trained to predict total boardings and/or alightings.

To reconstruct the passenger loads aboard vehicles from transaction data 12, transaction events are extracted from the transaction data, normalized, filtered and linked to known stops on the Public Transportation network of a city, available in a set of GTFS files. These transactions are linked to observed stop times, obtained from the AVL system, where available. Individual boardings and alightings computed from the transaction data are aggregated to generate FC boardings, alightings and load counts for these stop times.

Actual boardings, alightings and loads observed by the APC system can then enrich the stop times. When there are differences between the AVL and APC clocks aboard a vehicle for a given day, the data from the two is first synchronized. This includes computing a time difference between the two series of data by computing all the time differences, getting rid of outliers, and averaging the time difference for the remaining ones. The average time difference is then applied to the APC time stamps. If needed, APC stop times with boardings at the end of routes are split into two distinct stop times, one with the alightings to end the trip and one with the boardings to start a new one.
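The clock synchronization can be sketched as follows. Several details are assumptions, since the text does not specify them: the AVL and APC events are taken to be already paired, outliers are defined as differences lying more than `max_deviation` seconds from the median difference, and the function name and sample timestamps (in seconds) are hypothetical.

```python
from statistics import mean, median


def estimate_clock_offset(avl_times, apc_times, max_deviation=60.0):
    """Average AVL-minus-APC time difference after discarding outliers."""
    diffs = [avl - apc for avl, apc in zip(avl_times, apc_times)]
    centre = median(diffs)
    # Assumed outlier rule: keep differences close to the median difference.
    kept = [d for d in diffs if abs(d - centre) <= max_deviation]
    return mean(kept)


avl_times = [1000, 1300, 1600, 1900]
apc_times = [970, 1268, 1572, 1500]  # last pair is a mismatch (outlier)

offset = estimate_clock_offset(avl_times, apc_times)
synchronized_apc = [t + offset for t in apc_times]
print(offset)            # 30
print(synchronized_apc)  # [1000, 1298, 1602, 1530]
```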

Where the synchronized APC time stamps are close to AVL stop times (e.g., within 120 seconds), the AVL stop times are enriched with the APC data. In this way, some of the stop times are enriched with actual loads computed from the APC data.
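The enrichment step can be illustrated with a nearest-timestamp match under a tolerance. The nearest-neighbour rule, the function name `enrich_stop_times`, and the sample records are assumptions made for the sketch; only the 120-second example tolerance comes from the text.

```python
def enrich_stop_times(avl_stop_times, apc_records, tolerance=120):
    """Attach the APC load to each AVL stop time that has a close APC record.

    avl_stop_times: list of AVL timestamps (seconds).
    apc_records: list of (synchronized_timestamp, apc_load) pairs.
    Returns a dict mapping each AVL stop time to an APC load, or None.
    """
    enriched = {}
    for avl_time in avl_stop_times:
        nearest = min(
            apc_records,
            key=lambda rec: abs(rec[0] - avl_time),
            default=None,  # handles an empty list of APC records
        )
        if nearest is not None and abs(nearest[0] - avl_time) <= tolerance:
            enriched[avl_time] = nearest[1]
        else:
            enriched[avl_time] = None  # no actual load available for this stop time
    return enriched


avl = [1000, 1300, 1600]
apc = [(1010, 5), (1290, 8), (2000, 14)]
print(enrich_stop_times(avl, apc))  # {1000: 5, 1300: 8, 1600: None}
```

Stop times left without an APC load are the ones for which the trained classifier would later supply a prediction.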

Counts reconstructed through the fare collection data are different from those obtained from the APC systems. In the transportation network of the city studied, some travelers use paper tickets that are not visible to the system. It is also possible that some of the differences may be due to inadvertent or intentional misuse of the ATV devices. Additionally, APC data is only available for a limited number of stop times for a given day, as equipped vehicles rotate to collect counts for the routes during the month.

A classifier model is trained to predict total loads using the reconstructed and actual load data, where available. A Bayesian hierarchical multiple linear regression classifier model was trained, with a grouping on route and the following predictors: FC load, FC load at the previous stop time, FC boardings, aggregated FC boardings for the trip and FC load squared to capture some residuals.

FIG. 4 shows mean absolute error (MAE) for predicted loads on trips made on a route in a given month. The results show a global mean absolute error of less than 4 for the load predictions, using automated counts from onboard APC devices on selected vehicles for the classifier training. As can be seen, one trip is a substantial outlier, with an MAE of greater than 10, suggesting the benefit of aggregating trip data over several days.
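For reference, the mean absolute error of a trip is the average, over its stops, of the absolute difference between the predicted and actual loads. The sketch below applies this to the illustrative values of TABLE 1, not to the experimental data behind FIG. 4; the function name is hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted loads."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


apc_load = [5, 8, 14, 12, 0]        # actual loads (TABLE 1)
fc_load = [4, 7, 12, 9, 0]          # reconstructed loads (TABLE 1)
predicted_load = [6, 8, 13, 12, 1]  # illustrative predictions (TABLE 1)

print(mean_absolute_error(apc_load, predicted_load))  # 0.6
print(mean_absolute_error(apc_load, fc_load))         # 1.4
```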

FIG. 5 shows an example trip on one route of the transportation network, with the dark shading showing the credible interval at 0.5 probability and the lighter shading showing the credible interval at 0.9 probability. As can be seen from FIG. 5, for each stop the predicted load is almost always closer to the actual load (APCLoad) than the reconstructed load (FCLoad) is.

The results suggest that intelligent transportation systems, which generally include multiple vehicles, routes, and services that are utilized by a large number of users and that include automatic ticketing validation systems that collect validation information for travelers, should benefit from the present system and method. Administrators can improve management and planning of such transportation systems, such as by adding additional routes, increasing the number of buses or trains on a route, increasing the size of facilities (bus stops, train stations, etc.), and the like.

It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims. 

What is claimed is:
 1. A method for predicting passenger load comprising: providing a classifier which has been trained to predict passenger loads at vehicle stops on a transportation route, based on reconstructed passenger loads for vehicle stops on the route; acquiring transaction data for passengers boarding at vehicle stops on the transportation route; with a processor, computing reconstructed passenger loads for vehicle stops on the route based on the transaction data; and with the trained classifier, predicting a passenger load for at least one of the vehicle stops on the transportation route, based on the reconstructed passenger load for the vehicle stop.
 2. The method of claim 1, wherein the classifier is trained with actual passenger loads for respective vehicle stops on the route and a plurality of factors, the plurality of factors including the reconstructed passenger load for a given vehicle stop on the transportation route.
 3. The method of claim 2, wherein at least one of the plurality of factors is selected from: the reconstructed passenger load for a previous vehicle stop on the transportation route; an estimated number of passengers boarding at the given vehicle stop on the transportation route, the estimated number being based on the transaction data; an aggregate of estimated numbers of passengers boarding at vehicle stops; and a non-linear function of the reconstructed passenger load for the given vehicle stop on the transportation route.
 4. The method of claim 2, wherein the classifier is a Bayesian hierarchical multiple linear regression classifier.
 5. The method of claim 1, wherein the providing of the classifier comprises training the classifier.
 6. The method of claim 5, further comprising acquiring counts of passengers boarding and alighting at vehicle stops on at least one route of a transportation network and respective reconstructed passenger loads for the vehicle stops.
 7. The method of claim 1, wherein the acquiring transaction data for passengers boarding at vehicle stops on the transportation route comprises acquiring identifiers for electronic tickets used by at least some of the passengers boarding at the vehicle stops.
 8. The method of claim 1, wherein the computing reconstructed passenger loads comprises predicting alighting stops for the passengers, based on other boarding stops on routes of a transportation network for at least some of the passengers.
 9. The method of claim 1, wherein the computing reconstructed passenger loads comprises predicting alighting stops for the passengers, the predicting including applying a set of heuristics.
 10. The method of claim 1, wherein the computing reconstructed passenger loads comprises, for each vehicle stop in a sequence of vehicle stops on the route, computing a sum of a number of passengers already on board plus number of passengers boarding minus number of passengers alighting.
 11. The method of claim 1, further comprising outputting at least one of the predicted passenger load and information based thereon.
 12. A computer program product comprising a non-transitory recording medium storing instructions, which when executed on a computer, cause the computer to perform the method of claim 1.
 13. A system comprising memory which stores instructions for performing the method of claim 1 and a processor, in communication with the memory, for executing the instructions.
 14. A system for predicting passenger load comprising: a classifier which has been trained to predict passenger loads at vehicle stops on a transportation route, based on reconstructed passenger loads for vehicle stops on the route; a trip reconstruction component which predicts alighting stops for passenger trips based on transaction data for passengers boarding at vehicle stops on the transportation route; a load reconstruction component which computes reconstructed passenger loads for vehicle stops on the route, based on the boarding stops and the predicted alighting stops; a load prediction component which uses the trained classifier to predict a passenger load for at least one of the vehicle stops on the transportation route, based on the reconstructed passenger load for the vehicle stop; and a processor which implements the trip reconstruction component, load reconstruction component, and load prediction component.
 15. The system of claim 14, further comprising a training component which trains the classifier based on reconstructed passenger loads for vehicle stops in a transportation network which includes the transportation route and actual passenger load data based on counts of passenger boardings and alightings at the respective vehicle stops.
 16. The system of claim 15, further comprising an automated passenger load computation component which computes the actual passenger load data based on the counts of passenger boardings and alightings.
 17. The system of claim 14, further comprising an output component which outputs the predicted passenger load or information based thereon.
 18. The system of claim 14, wherein the trip reconstruction component predicts alighting stops for passenger trips using a set of heuristics.
 19. A method for generating a system for predicting passenger load comprising: acquiring transaction data for passengers boarding at vehicle stops on a plurality of transportation routes in a transportation network; computing reconstructed passenger loads for the vehicle stops on the plurality of transportation routes, based on the transaction data; acquiring count data for passengers boarding at the vehicle stops on the plurality of transportation routes; computing actual passenger loads for the vehicle stops on the plurality of transportation routes, based on the count data; training a classifier to predict passenger loads at vehicle stops on a transportation route in the transportation network, the training being based on the reconstructed passenger loads and actual passenger loads on the plurality of transportation routes; and storing the trained classifier in memory for predicting a passenger load for one of the vehicle stops on one of the transportation routes, based on a new reconstructed passenger load for the vehicle stop.
 20. A system comprising memory which stores instructions for performing the method of claim 19 and a processor, in communication with the memory, which executes the instructions. 